AITopics | instrument recognition

Collaborating Authors

instrument recognition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Persian Musical Instruments Classification Using Polyphonic Data Augmentation

Esfangereh, Diba Hadi, Sameti, Mohammad Hossein, Moridani, Sepehr Harfi, Javidpour, Leili, Baghshah, Mahdieh Soleymani

arXiv.org Artificial IntelligenceNov-11-2025

Musical instrument classification is essential for music information retrieval (MIR) and generative music systems. However, research on non-Western traditions, particularly Persian music, remains limited. We address this gap by introducing a new dataset of isolated recordings covering seven traditional Persian instruments, two common but originally non-Persian instruments (i.e., violin, piano), and vocals. We propose a culturally informed data augmentation strategy that generates realistic polyphonic mixtures from monophonic samples. Using the MERT model (Music undERstanding with large-scale self-supervised Training) with a classification head, we evaluate our approach with out-of-distribution data which was obtained by manually labeling segments of traditional songs. On real-world polyphonic Persian music, the proposed method yielded the best ROC-AUC (0.795), highlighting complementary benefits of tonal and temporal coherence. These results demonstrate the effectiveness of culturally grounded augmentation for robust Persian instrument recognition and provide a foundation for culturally inclusive MIR and diverse music generation systems.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.05717

Country: Asia > Middle East > Iran (0.28)

Genre: Research Report (0.85)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

A Hierarchical Deep Learning Approach for Minority Instrument Detection

Sechet, Dylan, Bugiotti, Francesca, Kowalski, Matthieu, d'Hérouville, Edouard, Langiewicz, Filip

arXiv.org Artificial IntelligenceJun-27-2025

Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability. Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level. Based on the Hornbostel-Sachs classification, such a hierarchical classification system is evaluated using the MedleyDB dataset, renowned for its diversity and richness concerning various instruments and music genres. This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction. This study showcases more reliable coarse-level instrument detection by bridging the gap between detailed instrument identification and group-level recognition, paving the way for further advancements in this domain.

artificial intelligence, instrument, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.21167

Country: Europe > United Kingdom (0.16)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MIRFLEX: Music Information Retrieval Feature Library for Extraction

Chopra, Anuradha, Roy, Abhinaba, Herremans, Dorien

arXiv.org Artificial IntelligenceNov-1-2024

This paper introduces an extendable modular system that compiles a range of music feature extraction models to aid music information retrieval research. The features include musical elements like key, downbeats, and genre, as well as audio characteristics like instrument recognition, vocals/instrumental classification, and vocals gender detection. The integrated models are state-of-the-art or latest open-source. The features can be extracted as latent or post-processed labels, enabling integration into music applications such as generative music, recommendation, and playlist generation. The modular design allows easy integration of newly developed systems, making it a good benchmarking and comparison tool. This versatile toolkit supports the research community in developing innovative solutions by providing concrete musical features.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.00469

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.05)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.05)

Genre: Research Report (0.69)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition

Vasilakis, Yannis, Bittner, Rachel, Pauwels, Johan

arXiv.org Artificial IntelligenceJul-25-2024

Music two-tower multimodal systems integrate audio and text modalities into a joint audio-text space, enabling direct comparison between songs and their corresponding labels. These systems enable new approaches for classification and retrieval, leveraging both modalities. Despite the promising results they have shown for zero-shot classification and retrieval tasks, closer inspection of the embeddings is needed. This paper evaluates the inherent zero-shot properties of joint audio-text spaces for the case-study of instrument recognition. We present an evaluation and analysis of two-tower systems for zero-shot instrument recognition and a detailed analysis of the properties of the pre-joint and joint embeddings spaces. Our findings suggest that audio encoders alone demonstrate good quality, while challenges remain within the text encoder or joint space projection. Specifically, two-tower systems exhibit sensitivity towards specific words, favoring generic prompts over musically informed ones. Despite the large size of textual encoders, they do not yet leverage additional textual context or infer instruments accurately from their descriptions. Lastly, a novel approach for quantifying the semantic meaningfulness of the textual space leveraging an instrument ontology is proposed. This method reveals deficiencies in the systems' understanding of instruments and provides evidence of the need for fine-tuning text encoders on musical data.

classification, representation, similarity, (15 more...)

arXiv.org Artificial Intelligence

2407.18058

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Add feedback

Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data

Koo, Junghyun, Chae, Yunkee, Jeon, Chang-Bin, Lee, Kyogu

arXiv.org Artificial IntelligenceJul-24-2023

Music source separation (MSS) faces challenges due to the limited availability of correctly-labeled individual instrument tracks. With the push to acquire larger datasets to improve MSS performance, the inevitability of encountering mislabeled individual instrument tracks becomes a significant challenge to address. This paper introduces an automated technique for refining the labels in a partially mislabeled dataset. Our proposed self-refining technique, employed with a noisy-labeled dataset, results in only a 1% accuracy degradation in multi-label instrument recognition compared to a classifier trained on a clean-labeled dataset. The study demonstrates the importance of refining noisy-labeled data in MSS model training and shows that utilizing the refined dataset leads to comparable results derived from a clean-labeled dataset. Notably, upon only access to a noisy dataset, MSS models trained on a self-refined dataset even outperform those trained on a dataset refined with a classifier trained on clean labels.

artificial intelligence, dataset, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2307.12576

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (0.64)

Industry:

Media > Music (0.70)
Leisure & Entertainment (0.70)
Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio

Kim, Kyungsu, Park, Minju, Joung, Haesun, Chae, Yunkee, Hong, Yeongbeom, Go, Seonghyeon, Lee, Kyogu

arXiv.org Artificial IntelligenceNov-15-2022

As digital music production has become mainstream, the selection of appropriate virtual instruments plays a crucial role in determining the quality of music. To search the musical instrument samples or virtual instruments that make one's desired sound, music producers use their ears to listen and compare each instrument sample in their collection, which is time-consuming and inefficient. In this paper, we call this task as Musical Instrument Retrieval and propose a method for retrieving desired musical instruments using reference music mixture as a query. The proposed model consists of the Single-Instrument Encoder and the Multi-Instrument Encoder, both based on convolutional neural networks. The Single-Instrument Encoder is trained to classify the instruments used in single-track audio, and we take its penultimate layer's activation as the instrument embedding. The Multi-Instrument Encoder is trained to estimate multiple instrument embeddings using the instrument embeddings computed by the Single-Instrument Encoder as a set of target embeddings. For more generalized training and realistic evaluation, we also propose a new dataset called Nlakh. Experimental results showed that the Single-Instrument Encoder was able to learn the mapping from the audio signal of unseen instruments to the instrument embedding space and the Multi-Instrument Encoder was able to extract multiple embeddings from the mixture of music and retrieve the desired instruments successfully. The code used for the experiment and audio samples are available at: https://github.com/minju0821/musical_instrument_retrieval

artificial intelligence, instrument, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2211.07951

Country: Asia > South Korea > Seoul > Seoul (0.05)

Genre: Research Report > New Finding (0.87)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Music Instrument Classification Reprogrammed

Chen, Hsin-Hung, Lerch, Alexander

arXiv.org Artificial IntelligenceNov-15-2022

The performance of approaches to Music Instrument Classification, a popular task in Music Information Retrieval, is often impacted and limited by the lack of availability of annotated data for training. We propose to address this issue with "reprogramming," a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model. We demonstrate that reprogramming can effectively leverage the power of the representation learned for a different task and that the resulting reprogrammed system can perform on par or even outperform state-of-the-art systems at a fraction of training parameters. Our results, therefore, indicate that reprogramming is a promising technique potentially applicable to other tasks impeded by data scarcity.

machine learning, natural language, pre-trained model, (18 more...)

arXiv.org Artificial Intelligence

2211.08379

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.66)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

ChMusic: A Traditional Chinese Music Dataset for Evaluation of Instrument Recognition

Gong, Xia, Zhu, Yuxiang, Zhu, Haidi, Wei, Haoran

arXiv.org Artificial IntelligenceAug-18-2021

Musical instruments recognition is a widely used application for music information retrieval. As most of previous musical instruments recognition dataset focus on western musical instruments, it is difficult for researcher to study and evaluate the area of traditional Chinese musical instrument recognition. This paper propose a traditional Chinese music dataset for training model and performance evaluation, named ChMusic. This dataset is free and publicly available, 11 traditional Chinese musical instruments and 55 traditional Chinese music excerpts are recorded in this dataset. Then an evaluation standard is proposed based on ChMusic dataset. With this standard, researchers can compare their results following the same rule, and results from different researchers will become comparable.

dataset, instrument recognition, recognition, (11 more...)

arXiv.org Artificial Intelligence

2108.0847

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > Texas (0.04)
North America > United States > Mississippi (0.04)

Genre: Research Report (0.40)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music

Kratimenos, Agelos, Avramidis, Kleanthis, Garoufis, Christos, Zlatintsi, Athanasia, Maragos, Petros

arXiv.org Machine LearningNov-27-2019

Instrument classification is one of the fields in Music Information Retrieval (MIR) that has attracted a lot of research interest. However, the majority of that is dealing with monophonic music, while efforts on polyphonic material mainly focus on predominant instrument recognition or multi-instrument recognition for entire tracks. We present an approach for instrument classification in polyphonic music using monophonic training data that involves mixing-augmentation methods. Specifically, we experiment with pitch and tempo-based synchronization, as well as mixes of tracks with similar music genres. Further, a custom CNN model is proposed, that uses the augmented training data efficiently and a plethora of suitable evaluation metrics are discussed as well. The tempo-sync and genre techniques stand out, achieving an 81% label ranking average precision accuracy, detecting up to 9 instruments in over 2300 testing tracks.

dataset, instrument, music, (14 more...)

arXiv.org Machine Learning

1911.12505

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
Europe > Greece (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Musical Instrument Recognition Using Their Distinctive Characteristics in Artificial Neural Networks

Toghiani-Rizi, Babak, Windmark, Marcus

arXiv.org Machine LearningMay-14-2017

In this study an Artificial Neural Network was trained to classify musical instruments, using audio samples transformed to the frequency domain. Different features of the sound, in both time and frequency domain, were analyzed and compared in relation to how much information that could be derived from that limited data. The study concluded that in comparison with the base experiment, that had an accuracy of 93.5%, using the attack only resulted in 80.2% and the initial 100 Hz in 64.2%.

artificial intelligence, instrument, machine learning, (16 more...)

arXiv.org Machine Learning

1705.04971

Country: Europe (0.14)

Genre:

Research Report > New Finding (0.68)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback